SUMMARY

During the surveillance of influenza pandemics, underreported data are a public health challenge that complicates the understanding of pandemic threats and can undermine mitigation efforts. We propose a method to estimate incidence reporting rates at early stages of new influenza pandemics, using the 2009 pandemic influenza A(H1N1) as an example. Routine surveillance data and statistics on travellers arriving from Mexico were used. Our method incorporates changes in reporting rates, such as linearly increasing trends due to enhanced surveillance. From our results, the reporting rate was estimated at 0.46% during the early stages of the pandemic in Mexico. We estimated the cumulative incidence in the Mexican population to be 0.7%, compared with the 0.003% reported by officials in Mexico at the end of April 2009. This method could help policy makers estimate the actual number of cases during new influenza pandemics and thus better determine appropriate control measures.
The online version of this article is published within an Open Access environment subject to the conditions of the Creative Commons Attribution license <http://creativecommons.org/licenses/by/3.0/>.

INTRODUCTION

During the early outbreak of an influenza pandemic, rapid disease transmission can lead to exponential rises in influenza cases throughout the population. Underreporting of influenza cases in the early stages poses problems in estimating both pandemic severity and transmission intensity. Underreporting stems in part from the short infectious period of influenza infections: individuals may recover before seeking treatment from their healthcare provider or before being tracked by a surveillance system. Asymptomatic or mild cases may not be reported at all. A previous study has shown that official surveillance reveals only a small proportion of actual infections during influenza pandemics [1]; in some instances the consultation rate among influenza-like illness case-patients was no more than 50%. Furthermore, cases increase exponentially during the initial stage of an outbreak, and the limited capacity of surveillance systems, such as limited serological testing, can also lead to underreporting [2].

Underreporting has consequential effects on the public health response. From a policy perspective, underreporting can lead officials to underestimate public health risk, which in turn affects the planning and implementation of systematic control and prevention activities. For example, there may be a delay in implementing entry screening for travellers or inadequate warning to local and national health departments. The underestimation of incidence and pandemic severity can also reduce education and health notices to the general public about the influenza virus, so that the public does not take measures to protect themselves through vaccines, hand washing, or other control measures. At the same time, the case-fatality rate would be overestimated, because asymptomatic and mild cases are missing from its denominator [3]. Where pandemic control systems are insufficient, this situation can place unexpected and unnecessary financial and human resource demands on a healthcare system. Therefore, reliable methods to estimate the reporting rate during early influenza epidemic outbreaks are critical to good public health and infectious disease response systems.

The influenza A(H1N1) pandemic that emerged in Mexico in mid-March 2009 is one example of country officials underestimating influenza incidence. Although it was not the peak season for influenza, routine influenza surveillance identified an unexpected increase in cases of influenza-like illness in mid-April 2009 [4]. An acute respiratory illness was discovered in two children and later confirmed as being caused by a new strain of H1N1 virus. Subsequently, on 26 April 2009, the World Health Organization (WHO) notified the public of the new H1N1 virus. Additional cases were soon discovered in the USA [5], and by the end of April the WHO had raised the pandemic alert level to phase 5. At that time, governments and the public still lacked sufficient knowledge about the early stages of the outbreak, and according to H1N1 surveillance data from the Ministry of Health in Mexico, cumulative incidence was as low as 0.003% of Mexico's population [6].

Owing to increasing awareness of H1N1 throughout April 2009, other at-risk countries began control measures at border points of entry to prevent local epidemics. For example, thermal screening was implemented, and suspected cases with a travel history to Mexico were monitored and some were quarantined [3]. Because border surveillance for influenza-like illness cases was quite thorough even before the H1N1 virus had spread globally, data on early cases, such as the time of import from the source country, are relatively more complete and timely than other available data. For estimating the size and local expansion of an influenza pandemic, this is a valuable data source that also provides a perspective on how the disease spreads.



Previous studies have demonstrated the usefulness of mathematical modelling in summarizing the epidemiology of infectious illness and in examining the impact of external factors on disease [7-14]. In this study, a mathematical modelling approach was adopted to develop a method to help quantify the spread of infectious disease in a population. The method estimates the incidence reporting rate from local routine surveillance data, with the estimates refined using statistics on travellers from the source country of an influenza pandemic. The approach uses the 2009 pandemic influenza A(H1N1) (pH1N1) outbreak as an example.

METHODS 
Mathematical model 

We adopted a susceptible-exposed-infectious-recovered (SEIR) model to describe the dynamics of the infectious disease [15]. At each time point $t$, the whole population is classified into one of four groups ('compartments'): susceptible [$S(t)$], exposed [$E(t)$], infectious [$I(t)$] or recovered [$R(t)$]. Using $S$, $E$, $I$ and $R$ to represent each compartment, the SEIR model has four differential equations describing the rates at which individuals move between compartments:

$$
\begin{aligned}
\frac{dS}{dt} &= -\beta S I = -\beta\,(S_L + S_T)\,I,\\
\frac{dE}{dt} &= \beta S I - \alpha E,\\
\frac{dI}{dt} &= \alpha E - \gamma I,\\
\frac{dR}{dt} &= \gamma I.
\end{aligned}
$$



In this compartmental model, once a susceptible individual (including local residents and travellers) in compartment $S(t)$ is infected, they move to compartment $E(t)$ and remain there for the latent period. When the latent period is over, they move to compartment $I(t)$ for the infectious period. When the infectious period is over, individuals in compartment $I(t)$ recover and move to compartment $R(t)$. $S_L$ is the local susceptible population size and $S_T$ is the number of travellers from the source country. As $S_T$ is far smaller than $S_L$, i.e. $S_L \gg S_T$, we approximate

$$ -\beta S I = -\beta\,(S_L + S_T)\,I \approx -\beta S_L I. $$

In the model, the probability of an individual becoming infected is configured using the basic reproduction number ($R_0$), the average number of secondary infections produced by a typical infectious individual in a wholly susceptible population. The transmission rate is $\beta$, so the force of infection is $\beta I$. The total population size $N$ is equal to $S + E + I + R$ at any time, and $N = S$ at time zero. We assumed that the lengths of the latent period and the infectious period follow exponential distributions with means $1/\alpha$ and $1/\gamma$, respectively. Adopting the linearization method [16], the basic reproduction number is $R_0 = \beta N/\gamma$.
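As a minimal sketch of how the SEIR system above could be integrated numerically, the Python below uses a fixed-step forward Euler scheme. The solver is an assumption: the paper's estimation was implemented in SAS and its numerical method is not stated. The sketch sets $\beta = R_0\gamma/N$ from the linearization result and records the daily flow $\alpha E$ into the infectious compartment, the quantity later compared with reported cases. The names `simulate_seir` and `SEIRResult`, the step size, and seeding with a few exposed individuals (so that $S \approx N$ at time zero) are illustrative choices.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SEIRResult:
    incidence: List[float]    # alpha*E summed over each day: new infectious cases per day
    infectious: List[float]   # I(t) at the end of each day
    final_state: Tuple[float, float, float, float]  # (S, E, I, R) after the last day


def simulate_seir(r0: float, latent_days: float, infectious_days: float,
                  population: float, initial_exposed: float = 1.0,
                  n_days: int = 100, steps_per_day: int = 24) -> SEIRResult:
    """Integrate the SEIR equations with a fixed-step forward Euler scheme (a sketch)."""
    alpha = 1.0 / latent_days          # mean latent period = 1/alpha
    gamma = 1.0 / infectious_days      # mean infectious period = 1/gamma
    beta = r0 * gamma / population     # rearranged from R0 = beta * N / gamma
    s, e, i, r = population - initial_exposed, initial_exposed, 0.0, 0.0
    dt = 1.0 / steps_per_day
    incidence, infectious = [], []
    for _ in range(n_days):
        new_infectious = 0.0
        for _ in range(steps_per_day):
            infection = beta * s * i * dt      # flow S -> E
            progression = alpha * e * dt       # flow E -> I (the alpha*E term)
            recovery = gamma * i * dt          # flow I -> R
            s -= infection
            e += infection - progression
            i += progression - recovery
            r += recovery
            new_infectious += progression
        incidence.append(new_infectious)
        infectious.append(i)
    return SEIRResult(incidence, infectious, (s, e, i, r))


# Example: a growing epidemic in one million people, seeded with 10 exposed individuals.
result = simulate_seir(r0=1.24, latent_days=1.6, infectious_days=1.4,
                       population=1_000_000, initial_exposed=10, n_days=60)
```

Forward Euler keeps the compartments summing to $N$ by construction, because every flow is subtracted from one compartment and added to another; a higher-order solver would be a reasonable substitute when accuracy matters more than transparency.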



Parameter estimation 

In our model, we assumed homogeneous mixing between individuals in the system being studied, and that cases were reported to officials when infectious. Since asymptomatic and non-severe cases may not be represented in the observed surveillance time-series data $U(t)$, we used $f_t(\cdot)$ to represent a functional form of the reporting rate, defined as (reported cases)/(actual cases): $f_t(\cdot) = U(t)/[\alpha E(t)]$, where $\alpha E(t)$, the daily number of newly infectious cases, is generated from the SEIR model. We considered two forms of $f_t(\cdot)$ in the estimation:

(1) Constant reporting rate: $f_t(r) = r$;

(2) Linearly increasing reporting rate:

$$
f_t(r_{\min}, r_{\max}) =
\begin{cases}
r_{\min}, & t < t_0,\\
r_{\min} + (r_{\max} - r_{\min})\,\dfrac{t - t_0}{t_1 - t_0}, & t_0 \le t < t_1,\\
r_{\max}, & t \ge t_1.
\end{cases}
$$
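The two reporting-rate forms can be written as small functions. This is a sketch with hypothetical names (`constant_rate`, `linear_rate`, `expected_reported`) and illustrative rates of 0.4% rising to 0.6%, which are not values from the paper. Days are assumed to be counted from 14 March 2009, so that $t_0$ = 17 April 2009 is day 34 and $t_1$ = 17 May 2009 is day 64.

```python
from functools import partial
from typing import Callable, List


def constant_rate(t: float, r: float) -> float:
    """Form (1): the same reporting rate r on every day t."""
    return r


def linear_rate(t: float, r_min: float, r_max: float, t0: float, t1: float) -> float:
    """Form (2): r_min before t0, a linear ramp on [t0, t1), and r_max from t1 onwards."""
    if t < t0:
        return r_min
    if t < t1:
        return r_min + (r_max - r_min) * (t - t0) / (t1 - t0)
    return r_max


def expected_reported(actual_incidence: List[float],
                      rate: Callable[[float], float]) -> List[float]:
    """Expected reported cases U(t) = f_t(.) * alphaE(t), with day t indexed from 0."""
    return [rate(t) * cases for t, cases in enumerate(actual_incidence)]


# Hypothetical day numbering: day 0 = 14 March 2009, so 17 April = day 34, 17 May = day 64.
T0, T1 = 34, 64
ramp = partial(linear_rate, r_min=0.004, r_max=0.006, t0=T0, t1=T1)
```

Passing the rate as a one-argument callable lets the same fitting code work with either form; `partial` fixes the parameters currently being evaluated in the grid search.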



For parameter estimation, we first iterated over the reporting-rate parameters by fixing their values on a grid. For each fixed value, the reproduction number $R_0$ was fitted to the surveillance data with the SEIR model using the least-squares method. We then used the earliest times of infected cases imported from Mexico ($T_i$) and the daily rate of travel ($m_i$) to each country $i$. Assuming that travellers had the same daily risk of exposure to the influenza virus as local residents, the average number of cases imported into country $i$ at time $t$ is $\beta m_i I(t)$, where $I(t)$ comes from the fitted SEIR model. Treating importations as Poisson events, the probability of importing and detecting at least one case from the source country at time $t$ is

$$ p_{i,t} = q_i\,\bigl(1 - \exp[-\beta m_i I(t)]\bigr), $$

where $q_i$ is the entry-screening sensitivity for case detection in country $i$. The expected time of the first imported case can therefore be simulated as

$$ \hat{T}_i = \sum_{k \ge 1} k\, p_{i,k} \prod_{j<k} \bigl(1 - p_{i,j}\bigr). $$

These iterations were repeated for each fixed value within the grid search. The optimum parameters of $f_t(\cdot)$ were those giving the minimum square root of the sum of standardized squared errors (RSE) between the observed dates ($T_i$) and the simulated times of the first imported cases ($\hat{T}_i$) from Mexico, i.e.

$$ \mathrm{RSE} = \sqrt{\sum_i \bigl(T_i - \hat{T}_i\bigr)^2}. $$
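The import step can be sketched as three helper functions (all names hypothetical). In this sketch $q_i$ is treated as the probability that entry screening in country $i$ detects an imported case, so it multiplies the Poisson probability of at least one infected traveller. The sum defining $\hat{T}_i$ is truncated at the simulation horizon, and the RSE is computed as an unweighted root of summed squared errors, because the exact standardizing weights are an assumption left out here.

```python
import math
from typing import List, Sequence


def daily_import_probabilities(infectious: Sequence[float], beta: float,
                               m_i: float, q_i: float) -> List[float]:
    """p_{i,t} = q_i * (1 - exp(-beta * m_i * I(t))) for each day t."""
    # beta * m_i * I(t) is the Poisson mean number of infected travellers to country i on day t;
    # q_i is treated as the probability that entry screening detects such a case.
    return [q_i * (1.0 - math.exp(-beta * m_i * i_t)) for i_t in infectious]


def expected_first_import_day(p: Sequence[float]) -> float:
    """hat{T}_i = sum_k k * p_k * prod_{j<k} (1 - p_j), truncated at the horizon len(p)."""
    no_import_yet = 1.0   # probability that no case was detected on days 1..k-1
    total = 0.0
    for k, p_k in enumerate(p, start=1):
        total += k * p_k * no_import_yet
        no_import_yet *= 1.0 - p_k
    return total


def rse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Square root of the summed squared errors between observed T_i and simulated hat{T}_i."""
    return math.sqrt(sum((t - t_hat) ** 2 for t, t_hat in zip(observed, simulated)))
```

Any probability mass for a first import after the horizon is dropped, as in the truncated sum, so the horizon should extend well past the latest observed first-import date.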



We also developed a bootstrap method to calculate 95% confidence intervals (CIs). Pairs $(T_i^*, m_i^*)$ were drawn at random, with replacement, from the original pairs $(T_i, m_i)$, and the bootstrapped RSE for the $j$th iteration of bootstrapping was

$$ \mathrm{RSE}^{(j)} = \sqrt{\sum_i \bigl(T_i^* - \hat{T}_i^*\bigr)^2}. $$

One thousand bootstrapped values of $\mathrm{RSE}^{(j)}$ were generated, each with its corresponding fitted parameters. The 2.5th and 97.5th percentiles of the fitted parameters over the 1000 samples were the lower and upper limits, respectively, of the non-parametric 95% CI.
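A pairs (case-resampling) percentile bootstrap along these lines might look like the sketch below. The `fit` argument stands for the full grid search over reporting-rate parameters; `mean_first_import_day` is a hypothetical placeholder used only so that the example runs. The 2.5th and 97.5th percentiles are taken with `statistics.quantiles(..., n=40, method="inclusive")`, whose first and last cut points fall at 2.5% and 97.5%.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[float, float]  # (T_i, m_i): first-import day and daily travel rate for country i


def pairs_bootstrap_ci(pairs: Sequence[Pair],
                       fit: Callable[[List[Pair]], float],
                       n_boot: int = 1000, seed: int = 2009) -> Tuple[float, float]:
    """Percentile 95% CI from refitting on country pairs resampled with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(pairs) for _ in range(len(pairs))]
        estimates.append(fit(resample))
    # 39 cut points at 2.5%, 5%, ..., 97.5%; keep the first and the last.
    cuts = statistics.quantiles(estimates, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def mean_first_import_day(sample: List[Pair]) -> float:
    """Hypothetical stand-in for the grid-search fit: the mean T_i of the sample."""
    return statistics.fmean([t for t, _ in sample])
```

Resampling whole $(T_i, m_i)$ pairs keeps each country's arrival date tied to its own travel volume, which is why pairs rather than residuals are resampled when only a dozen countries are available.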

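Given the estimated $R_0$, we back-calculated the exponential growth rate of the pandemic [17]:

$$ R_0 = 1 + \frac{\theta^2 + \theta(\alpha + \gamma)}{\alpha\gamma}, $$

where $\theta$ is the exponential growth rate. Assuming exponential growth during the early phase of the pandemic, the date of pH1N1 seeding can be calculated as

$$ \text{date of seeding} = \text{date of first confirmed case} - \frac{\ln\bigl[U(t)/f_t(\cdot)\bigr]}{\theta}\ \text{days}, $$

where $U(t)/f_t(\cdot)$ is the estimated number of actual infections on the date of the first confirmed case.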
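For exponentially distributed latent and infectious periods, the growth-rate relation is a quadratic in $\theta$, $\theta^2 + (\alpha+\gamma)\theta - (R_0 - 1)\alpha\gamma = 0$, equivalently $R_0 = (1 + \theta/\alpha)(1 + \theta/\gamma)$, so $\theta$ is its positive root. The sketch below (function names hypothetical) solves for $\theta$ and then steps back $\ln[U(t)/f_t(\cdot)]/\theta$ days from the first confirmed case, rounding to whole days. It uses the mean latent and infectious periods of 1.6 and 1.4 days and, as an illustrative input, the figure of about 400 actual infections on 14 March 2009 quoted in the Results.

```python
import math
from datetime import date, timedelta


def growth_rate_from_r0(r0: float, latent_days: float, infectious_days: float) -> float:
    """Positive root theta of R0 = 1 + (theta^2 + theta*(alpha + gamma)) / (alpha*gamma)."""
    alpha, gamma = 1.0 / latent_days, 1.0 / infectious_days
    # Rearranged: theta^2 + (alpha + gamma)*theta - (R0 - 1)*alpha*gamma = 0
    b = alpha + gamma
    c = -(r0 - 1.0) * alpha * gamma
    return (-b + math.sqrt(b * b - 4.0 * c)) / 2.0


def seeding_date(first_case: date, actual_infections: float, theta: float) -> date:
    """Date of first confirmed case minus ln(U(t)/f_t(.)) / theta days, rounded to whole days."""
    days_back = math.log(actual_infections) / theta
    return first_case - timedelta(days=round(days_back))


theta = growth_rate_from_r0(1.24, latent_days=1.6, infectious_days=1.4)
seeding = seeding_date(date(2009, 3, 14), actual_infections=400, theta=theta)
print(f"theta = {theta:.4f} per day, seeding date = {seeding}")
# prints: theta = 0.0757 per day, seeding date = 2008-12-25
```

With $R_0 = 1.24$ the sketch gives a doubling time of roughly nine days and a seeding date in late December 2008, in line with the Discussion.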

The estimation method was implemented using 
SAS v. 9.2.1 software (SAS Institute Inc., USA). 



Materials and parameter values 

The population of Mexico ($N$) was 106 682 518 in 2009, a figure provided by the National Council for Population of Mexico [18]. The pH1N1 surveillance data [$U(t)$], shown in Figure 1, were obtained from the Ministry of Health of Mexico and cover the first wave of the pandemic, from 14 March 2009 to 27 May 2009 [6]. We assumed that the reporting rate either remained constant throughout this period or increased linearly. In the linear-increase approach, the start date ($t_0$) of enhanced surveillance in Mexico was 17 April 2009 [19] and the end date ($t_1$) was 17 May 2009.

Table 1. Number of travellers from Mexico to each country in March and April 2009, and earliest date of cases imported from Mexico

Destination country    Travellers (n)    Earliest date (2009)    Reference
Canada                 101 313           28 April                [28]
Spain                  65 724            28 April                [28]
United Kingdom         20 513            28 April                [28]
Costa Rica             16 950            29 April                [28]
Germany                35 772            30 April                [28]
The Netherlands        27 640            30 April                [34]
France                 61 960            1 May                   [35]
Colombia               24 535            3 May                   [36]
El Salvador            15 090            4 May                   [37]
Argentina              24 609            7 May                   [38]
Brazil                 38 749            7 May                   [39]
Cuba                   42 802            12 May                  [40]

Fig. 1. Confirmed cases in Mexico between 14 March 2009 and 27 May 2009.

Traveller data, including the earliest dates of infected cases arriving from Mexico in different countries, are shown in Table 1. We estimated the daily rate of travel to each country $i$ ($m_i$) by dividing the passenger count for March and April 2009 by 61 days. We excluded the USA from our study because air travel is not the only means of cross-border transport between the two countries.
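The daily travel rates follow directly from Table 1. The sketch below (names hypothetical) divides each March-April 2009 passenger count by 61 days (31 + 30); for Canada this gives 101 313 / 61, or about 1660.9 travellers per day.

```python
# Passenger counts (March + April 2009) transcribed from Table 1.
TRAVELLERS = {
    "Canada": 101_313, "Spain": 65_724, "United Kingdom": 20_513,
    "Costa Rica": 16_950, "Germany": 35_772, "The Netherlands": 27_640,
    "France": 61_960, "Colombia": 24_535, "El Salvador": 15_090,
    "Argentina": 24_609, "Brazil": 38_749, "Cuba": 42_802,
}
DAYS_MARCH_APRIL_2009 = 31 + 30  # 61 days


def daily_travel_rates(counts: dict, days: int = DAYS_MARCH_APRIL_2009) -> dict:
    """m_i = passenger count / number of days, for each destination country i."""
    return {country: n / days for country, n in counts.items()}


rates = daily_travel_rates(TRAVELLERS)
for country, m_i in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{country:16s} {m_i:8.1f} travellers/day")
```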

The epidemiological parameters used in the estimation were mostly taken from previous findings on pH1N1. The mean lengths of the latent and infectious periods were set at 1.6 days and 1.4 days, respectively [20-24].



Sensitivity analysis 

Limited entry screening at airports at the initial stage of H1N1 could have left cases from Mexico undetected in the early stages [25-27], especially since Mexico did not implement exit screening. To account for undetected cases, we tested our results with a range of entry-screening sensitivities. As exact entry-screening sensitivities would vary between countries, we drew each country's screening sensitivity uniformly from 30% to 100% in every simulation.

Fig. 2. Values of the minimum square root of the sum of standardized squared errors (RSE) and $R_0$ given different constant $r$.

We also performed a multivariate sensitivity analysis on the lengths of the latent and infectious periods. The latent period was assumed to follow a gamma distribution with a mean of 1.6 days and a standard deviation of 0.5 days; the infectious period followed a gamma distribution with a mean of 1.4 days and a standard deviation of 0.5 days. Parameter distributions were drawn from 1000 simulations.
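Drawing the sensitivity-analysis inputs needs only the standard library. A gamma distribution with mean $\mu$ and standard deviation $\sigma$ has shape $(\mu/\sigma)^2$ and scale $\sigma^2/\mu$, and Python's `random.gammavariate(alpha, beta)` takes exactly that shape and scale. The sketch below (hypothetical function names, arbitrary seed) draws 1000 period pairs and, separately, per-country screening sensitivities uniform on [0.3, 1.0]; refitting the model for each draw is not shown.

```python
import random
from typing import Dict, List, Tuple


def gamma_shape_scale(mean: float, sd: float) -> Tuple[float, float]:
    """Convert a gamma mean and standard deviation into (shape, scale)."""
    shape = (mean / sd) ** 2   # since mean = shape * scale and variance = shape * scale^2
    scale = sd ** 2 / mean
    return shape, scale


def draw_periods(rng: random.Random) -> Tuple[float, float]:
    """One draw of (latent, infectious) period lengths in days."""
    latent = rng.gammavariate(*gamma_shape_scale(1.6, 0.5))
    infectious = rng.gammavariate(*gamma_shape_scale(1.4, 0.5))
    return latent, infectious


def draw_sensitivities(rng: random.Random, countries: List[str]) -> Dict[str, float]:
    """One draw of entry-screening sensitivities, uniform on [0.3, 1.0] per country."""
    return {c: rng.uniform(0.3, 1.0) for c in countries}


rng = random.Random(1)
period_draws = [draw_periods(rng) for _ in range(1000)]
sensitivity_draws = [draw_sensitivities(rng, ["Canada", "Spain", "Cuba"]) for _ in range(1000)]
```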



RESULTS 

From our model, the constant reporting rate ($r$) was estimated at 0.46% using the minimum RSE, with a bootstrapped 95% CI of 0.28% to 0.69%, and the estimated $R_0$ was 1.24 (Fig. 2, Table 2). The estimated $R_0$ remained steady when $r$ was >0.1%. The figure shows that an increasing reporting rate was associated with exponential decreases in $R_0$; thus, fitting only the surveillance data to the epidemic model would give unreliable estimates of $r$. Using these estimates, the cumulative incidence in the Mexican population was 0.7% (95% CI 0.4-1.1) at the end of April 2009, when the WHO announced the phase 5 pandemic alert level.



The reporting rate did not increase appreciably after Mexican officials stepped up their surveillance systems in mid-April 2009 (Table 2): $r_{\min}$ was 0.46% (bootstrapped 95% CI 0.27-0.68), whereas $r_{\max}$ was 0.47% (bootstrapped 95% CI 0.28-0.69). Reporting behaviour may not have been significantly affected during this short time-frame.

In the study, we considered the possibility of missed detections of imported cases in the estimation process. Our results were moderately sensitive to the entry-screening sensitivities of countries. If the entry-screening sensitivities were distributed uniformly between 30% and 100%, the constant $r$ was estimated as 0.18% (95% CI 0.09-0.31) (Table 2). This value was lower because the average probability of detection decreased. If a linear trend was assumed, a slight increase in the reporting rate was observed: the rate increased from 0.10% (95% CI 0.03-0.26) to 0.31% (95% CI 0.11-0.80). However, this increase was not significant and did not deviate much from our initial estimates.

The impact of variation in the lengths of the latent and infectious periods on our results was also tested. As shown in Figures 3 and 4, this variation had little impact on the estimated reporting rates. The constant $r$ was 0.44% (bootstrapped 95% CI 0.31-0.69), and the linear reporting rates ($r_{\min}$ and $r_{\max}$) were both close to this value (Table 2). However, the variation did affect the estimated value of $R_0$: the estimated median $R_0$ was 1.24 (range 1.11-1.44) under the constant-$r$ assumption. No significant difference in the range of $R_0$ was observed under the linearly increasing assumption. The range of $R_0$ was consistent with other studies [20, 28, 29].

Table 2. Estimates of reporting rates (bootstrapped 95% confidence intervals) given different variations

Variations                                  Constant r (%)      Linear r_min (%)    Linear r_max (%)
None                                        0.46 (0.28-0.69)    0.46 (0.27-0.68)    0.47 (0.28-0.69)
Entry-screening sensitivity                 0.18 (0.09-0.33)    0.10 (0.03-0.26)    0.31 (0.11-0.80)
Lengths of latent and infectious periods    0.44 (0.31-0.69)    0.44 (0.31-0.71)    0.45 (0.32-0.72)

Fig. 3. The effect of variations in the length of the latent period (~gamma[mean = 1.6, s.d. = 0.5]) and the length of the infectious period (~gamma[mean = 1.4, s.d. = 0.5]) given a constant reporting rate assumption. The left panel is the box-plot of $r$ and the right panel is the box-plot of $R_0$.

Fig. 4. The effect of variations in the lengths of the latent and infectious periods given a linearly increasing reporting rate assumption. The impacts on $r_{\min}$, $r_{\max}$ and $R_0$ are shown by the box-plots in the left, middle and right panels, respectively.

Given these estimates, the date of pH1N1 seeding was interpolated back exponentially from the date of the first confirmed case (i.e. 14 March 2009, with an estimate of about 400 infections). The date of seeding for pH1N1 was estimated as 24 December 2008 (95% CI 17-29 December 2008), in order to maintain a sufficiently large epidemic size for the exportation of cases. About 5500 Mexicans were infected with pH1N1 before the date of the first confirmed case in the surveillance data. When the minimum and maximum values of the range of $R_0$ (1.11 and 1.44, respectively) were adopted in the estimation, the dates of seeding for pH1N1 were 26 September 2008 (95% CI 12 September 2008 to 7 October 2008) and 27 January 2009 (95% CI 23-30 January 2009), respectively.

DISCUSSION 

A reliable method to estimate reporting rates during the early phases of a new influenza pandemic is critical to infectious disease response in the 21st century, especially with increased travel by air, land and sea [25]. Its importance was highlighted in Mexico's 2009 influenza pandemic, in which the incidence reported by Mexican officials (0.003%) during the early stages of the outbreak was far below our estimate for the same period. Even though the pH1N1 strain had been confirmed in mid-April 2009, the reporting rates of Mexican officials still did not increase. This situation masked the actual growth of pH1N1, leading to reduced public awareness and potentially more rapid disease transmission. Conversely, inaccurate estimates that overstate the risk can provide misleading information to the public and potentially raise levels of anxiety or panic [30].

A reliable estimate can assist officials at local, national and global levels in planning and implementing prevention and control strategies for pandemic influenza during the early stages, and can better inform policy and protocols for other infectious disease outbreaks. In our study, we introduced such a method, which uses information available to countries during a pandemic, namely the times at which imported cases arrive from a source country, to estimate reporting rates. According to our results, the epidemic in 2009 was larger than officially reported: we estimated a cumulative incidence of 0.7% (about 691 000 individuals) in the Mexican population, compared with the 0.003% reported by the Ministry of Health of Mexico [6]. Our estimate of epidemic size was in line with other studies [2, 31] but higher than that of Fraser et al. [28]. The reason for the difference is that our approach used time-series data on reported incidence, which can help validate results obtained from traveller data. Several studies have used a cross-sectional set of travel data to estimate the actual epidemic size, but those approaches did not aim to project the epidemic curve or to address trends in reporting behaviour.

Interpolated estimation suggested that pH1N1 was initiated in late December 2008, which agrees well with other studies [28] and suggests that the pH1N1 virus had the potential to spread to other continents before laboratory confirmation of the virus [29]. Using the SEIR model and our estimates, we also estimated that around 0.005% of the Mexican population was infected before the first case was detected by the surveillance system. Therefore, cases exported from Mexico to other countries but undetected before the first global case was reported could have affected our estimates. According to the mathematical model, the probability of at least one case being imported from Mexico into at least one of the listed countries (Table 1) before 14 March 2009 was about 0.21 (results not shown). Hence, early missed detections of imported cases from Mexico would not be unexpected. This issue is similar to that of the entry-screening sensitivities examined in the Results section, and we believe it would have only a minor effect on our findings.
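One way a probability such as the value of about 0.21 mentioned above could be obtained, assuming importations are independent across countries and days, is $1 - \prod_i \prod_t (1 - p_{i,t})$ over the daily detection probabilities before 14 March 2009. The sketch below (hypothetical function name and toy inputs) computes this in log space with `math.log1p`, which stays accurate when the $p_{i,t}$ are small; the paper does not state its exact calculation.

```python
import math
from typing import Dict, Sequence


def prob_any_import(daily_probs: Dict[str, Sequence[float]]) -> float:
    """P(at least one detected import in any country) = 1 - prod_i prod_t (1 - p_{i,t})."""
    log_none = 0.0  # log-probability that no country detects an import on any day
    for probs in daily_probs.values():
        for p in probs:
            if p >= 1.0:
                return 1.0  # a certain import on some day makes the union certain
            log_none += math.log1p(-p)  # accurate for small p
    return 1.0 - math.exp(log_none)
```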

The reliability of our proposed method would depend greatly on the quantity and quality of travel surveillance available at borders during the early stages of a potential pandemic. If surveillance data on travellers could be collected in a timely way, they could help estimate the size and threat of a new influenza pandemic. However, acquiring large samples is challenging because countries, especially those that do not border each other, have different and incompatible surveillance systems as well as disparate policies on international reporting and collaboration. In our study, we found only 12 countries that reported confirmed cases with a known travel history to Mexico. Given this small sample size, a bootstrap method was the preferred choice. In the future, improved coordination and technical innovations to streamline or even centralize infectious disease surveillance of travellers between countries would benefit public health.

962 K. C. Chong and others

Besides surveillance data on travellers, routine serological surveys could be another source for estimating incidence. However, compared with surveillance data at borders, routine seroprevalence samples may not be suitable during the initial outbreak of a pandemic, as they require laboratory resources and a longer collection time [32]. Their reliability also depends on the sampling frame of the data [33]. Combining serial cross-sectional serological data with surveillance data could give reliable estimates of infection rates, since serological data could refine parameter estimates [33]. To account for possible estimation errors, multi-faceted surveillance measures are recommended, especially during the early stages of new influenza pandemics.

One of the advantages of our method is its flexibility in incorporating other epidemic models: it can be extended by applying the same reporting-rate function to the incidence generated by other epidemic models. For example, our approach could potentially be extended to demographically stratified models. As younger age groups were more likely to be affected by pH1N1 and to be ascertained, incorporating demographically stratified models would make the modelling results more realistic. However, sufficient data are required to support such an extension.

One of the caveats of applying the method to pH1N1 in Mexico was the assumption of homogeneous dispersion of infections throughout the source country [31]. The pH1N1 outbreak may not yet have spread to all cities in Mexico at the early stage. Without infection data at the city level, the resolution of our results is limited, and spatial variation could alter our estimates. Although our method provides further understanding of how to estimate incidence reporting rates at early stages of an influenza outbreak, future studies could explore further model extensions.